{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "c807393a",
   "metadata": {},
   "source": [
    "# hvplot.plotting.scatter_matrix\n",
    "\n",
    "```{eval-rst}\n",
    ".. currentmodule:: hvplot.plotting\n",
    "\n",
    ".. autofunction:: scatter_matrix\n",
    "```\n",
    "\n",
    "## Examples\n",
    "\n",
    "### Basic scatter matrix plot\n",
    "\n",
    "This example shows how to create a simple scatter matrix plot from a DataFrame. The plot includes all the columns of the DataFrame.\n",
    "\n",
    ":::{tip}\n",
    "Enable the *Box Select* tool and select an area on one of the off-diagonal scatter plots, you will see that a selection is automatically displayed on the other scatter plots, allowing to better explore and understand the dataset. This feature is called linked brushing (read more about it in this [HoloViews user guide](https://holoviews.org/user_guide/Linked_Brushing.html)) and does not require a live Python process, it is purely a front-end functionality. Note that it also works with the *Lasso Select* tool.\n",
    ":::"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c32e0143-1ea1-45ac-a87b-afb9f127ec73",
   "metadata": {},
   "outputs": [],
   "source": [
    "import hvplot\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "df = pd.DataFrame(np.random.randn(200, 4), columns=['A','B','C','D'])\n",
    "\n",
    "hvplot.plotting.scatter_matrix(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3687d2eb-78dc-46c8-b23a-a757cf3cfdb3",
   "metadata": {},
   "source": [
    "### Customize the off-diagonal plot types\n",
    "\n",
    "The `chart` keyword allows to change the type of the *off-diagonal* plots."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8713de1f-3174-4d6a-b978-47be14c07d06",
   "metadata": {},
   "outputs": [],
   "source": [
    "import hvplot\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "df = pd.DataFrame(np.random.randn(100, 2), columns=['A','B'])\n",
    "\n",
    "hvplot.plotting.scatter_matrix(df, chart='bivariate') +\\\n",
    "hvplot.plotting.scatter_matrix(df, chart='hexbin')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bc1cf6f2-8725-46d5-9a02-86ccb51a76d4",
   "metadata": {},
   "source": [
    "### Customize the off-diagonal plot types\n",
    "\n",
    "The `diagonal` parameter allows to change the type of the *diagonal* plots."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cc1a72d3-4989-4772-b1ff-ba2b277054b2",
   "metadata": {},
   "outputs": [],
   "source": [
    "import hvplot\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "df = pd.DataFrame(np.random.randn(100, 2), columns=['A','B'])\n",
    "\n",
    "hvplot.plotting.scatter_matrix(df, diagonal='kde')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b7f1ae2b-b54c-4aa7-aabb-1b1a483473ac",
   "metadata": {},
   "source": [
    "### Control the tools\n",
    "\n",
    "Setting `tools` to include a selection tool like `box_select` and an inspection tool like `hover` permits further analysis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4eb30670-c1c4-4a21-98a8-9e96ede30a1b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import hvplot\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "df = pd.DataFrame(np.random.randn(100, 2), columns=['A','B'])\n",
    "\n",
    "hvplot.plotting.scatter_matrix(df, tools=['box_select', 'hover'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a13c21fb-652a-4102-8b8c-a24ebae7239b",
   "metadata": {},
   "source": [
    "### Color the data per group\n",
    "\n",
    "The `c` parameter allows to colorize the data by a given column, here by `'CAT'`. Note also that the `diagonal_kwds` parameter (equivalent to `hist_kwds` in this case or `density_kwds` for *kde* plots) allow to customize the diagonal plots."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "30a0daa5-b488-443a-ae06-e873e8c0e6bf",
   "metadata": {},
   "outputs": [],
   "source": [
    "import hvplot\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "df = pd.DataFrame(np.random.randn(100, 2), columns=['A','B'])\n",
    "df['CAT'] = np.random.choice(['X', 'Y', 'Z'], len(df))\n",
    "\n",
    "hvplot.plotting.scatter_matrix(df, c='CAT', diagonal_kwds=dict(alpha=0.3))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "52e61d90-2c69-425e-8d32-a7592b340c9b",
   "metadata": {},
   "source": [
    "### Display large data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e02492fa-2b9d-4080-9050-fec08c292352",
   "metadata": {},
   "source": [
    "Scatter matrix plots may end up with a large number of points having to be rendered which can be challenging for the browser or even just crash it. In that case you should consider setting to `True` the [`rasterize`](option-rasterize) (or [`datashade`](option-datashade)) option that uses [Datashader](https://datashader.org/) to render the off-diagonal plots on the backend and then send more efficient image-based representations to the browser.\n",
    "\n",
    "The following scatter matrix plot has 1,200,000 (12x100,000) points that are rendered efficiently by Datashader."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2a2a15ee-c271-45e1-ad6d-236d1e59ff2a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import hvplot\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "df = pd.DataFrame(np.random.randn(100_000, 4), columns=['A','B','C','D'])\n",
    "\n",
    "hvplot.plotting.scatter_matrix(df, rasterize=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e7c920a4-e1a0-4475-bfda-426ee0c1613d",
   "metadata": {},
   "source": [
    "When `rasterize` (or `datashade`) is toggled it's possible to make individual points more visible by setting `dynspread=True` or `spread=True`. Head over to the [Working with large data using datashader](https://holoviews.org/user_guide/Large_Data.html) guide of HoloViews to learn more about these operations and what parameters they accept (which can be passed as `kwds` to `scatter_matrix`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d869d458-ff8c-4b66-b177-fe9c5197d9b1",
   "metadata": {},
   "outputs": [],
   "source": [
    "import hvplot\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "df = pd.DataFrame(np.random.randn(100_000, 4), columns=['A','B','C','D'])\n",
    "\n",
    "hvplot.plotting.scatter_matrix(df, rasterize=True, dynspread=True)"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
